Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Develop stream 2024-09-12 #462

Open
wants to merge 45 commits into
base: develop
Choose a base branch
from

Conversation

NB4444
Copy link
Contributor

@NB4444 NB4444 commented Sep 12, 2024

  • Merged changes from upstream CCCL/thrust 2.4.0
  • Synchronize rocThrust with Thrust v2.4.0 .
  • Split the contents of HIPSTDPAR's forwarding header into several implementation headers.
  • Fixed --benchmark_format from not working in benchmarks.
  • Added a caching allocator to benchmarks to reduce noise and runtime.

The pipeline fails, because the version of rocPRIM used is not including this commit yet.


Closes #478.

Beanavil and others added 30 commits September 12, 2024 09:30
This prevents CCCL/thrust's build breakage because of v2.4.0 changes
@NB4444 NB4444 marked this pull request as ready for review September 12, 2024 14:21
@NB4444
Copy link
Contributor Author

NB4444 commented Sep 12, 2024

@AlexVlx any opinions about the HIPSTDPAR changes?

Copy link
Contributor

@AlexVlx AlexVlx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, thanks for doing this! There are a few nits / things I'd like to see changed though, so please consider them. Cheers!


hipError_t status;

status = rocprim::nth_element(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather we didn't do things like so, if possible, but rather simply kept the ::std:: overloads as a very thin forwarder to thrust i.e. this should just forward to thrust::nth_element(...), like the other algorithms did.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason behind forwarding to rocPRIM's algorithms is that currently rocThrust does not have support for the new Parallel STL algorithms that have been added to rocPRIM, as neither NVIDIA's Thrust nor CUB have implemented those yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be somewhat less cognitively challenging to have something like a stub thrust::__nth_element(...) call (we'd have to add the stubs, but you're already splitting into impl headers, no?) with a // TODO: this'll go away once something happens rather than having inline prim for a few algorithms, whilst most others forward to the higher level thrust interface. You'll note that the originals tried to keep away from directly doing any hip* stuff, and trying to do only C++ level handling before forwarding to Thrust. I am concerned with some problematic future, where some algorithms just doing a bunch of rocprim stuff inline because why not, some others keeping the forward to Thrust approach and, possibly, a 3rd category doing something else:) Perhaps I'm the odd person out with this preference / concern, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late answer. After discussing it a little bit internally, we are unsure whether adding stubs to rocThrust would be ideal: the interface of Thrust is based on that of the C++ Standard Library, but is not guaranteed to be exactly the same, so adding new API entries to rocThrust without knowing how Thrust would do it means that we may need to change this interface later (when Thrust adds these algorithms), which would be a breaking change.
If the only reason for adding stubs to rocThrust is removing the calls to rocPRIM from hipstdpar, then it's probably not worth the risk. Perhaps you have other worries about this that we are failing to see?

Also, about this:

I am concerned with some problematic future, where some algorithms just doing a bunch of rocprim stuff inline because why not, some others keeping the forward to Thrust approach and, possibly, a 3rd category doing something else:)

I don't think it should be a concern, as the missing algorithms are exclusively being added to rocPRIM.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I don't really follow what the risk is. We're not subcontractors for the Thrust development team, so having additional interfaces wouldn't be a problem, if we were to go in that direction. Furthermore, perhaps the point is lost, but what I am suggesting is to do something along the lines of thrust::__nth_element_placeholder_stub_do_not_use_danger_will_robinson, where the latter holds the current rocprim calls. The definition for this can and should live in the implementation header. It's primarily a matter of symmetry and code consistency, no other algorithm calls Prim inline.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, so there would still be calls to rocPRIM from the hipstdpar implementation headers, only that the algorithms themselves would appear as if they call to rocThrust (but they call the stubs), right? If that's the case I see no problem adding this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that's it; then, when / if upstream adds these, we can just go and delete the double underscore prefix and the impl. This prevents folks (future maintainers) from having to ask themselves "should I just use PRIM here, or should I call Thrust", and at a glance makes it obvious we always call Thrust from the overloads. Thank you for taking the time to go through this, it's highly appreciated!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the changes for this! Please check if it fits your idea for the use of stubs.


hipError_t status;

status = rocprim::partial_sort(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem.


hipError_t status;

status = rocprim::partial_sort_copy(nullptr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ibidem.

* \brief <tt>Batch operations</tt> implementation detail header for HIPSTDPAR.
*/

#pragma once
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing the split is fine, since it aligns with Thrust convention, I guess; however, IMHO it'd be better if the master header (this), which is the object of implicit inclusion / is part of the public interface, would do nothing but include the split headers, as opposed to having any actual functionality. We could also probably guard the interpose_allocations.hpp one with its respective macros. Finally, we probably want to error on the impl headers if they're directly included without __HIPSTDPAR__ being defined, as that would indicate user error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some questions about these suggestions:

IMHO it'd be better if the master header (this), which is the object of implicit inclusion / is part of the public interface, would do nothing but include the split headers, as opposed to having any actual functionality

Do you mean that we should further split the master headers (the ones named after the C++ Standard Library categories for algorithms)?

We could also probably guard the interpose_allocations.hpp one with its respective macros.

What macros do you refer to? IINM the interpose_allocations.hpp is already guarded by the __HIPSTDPAR__ and
__HIPSTDPAR_INTERPOSE_ALLOC__ macros (as it was before splitting hipstdpar_lib.hpp).

Finally, we probably want to error on the impl headers if they're directly included without HIPSTDPAR being defined, as that would indicate user error.

And for this, I don't fully understand why would we need to error here. Originally there was no error handling in case that macro was not defined in the hipstdpar_lib.hpp, so I'm not sure why it would be done for the impl headers. Also, as I had understood it, the forwarding header was not intended to be included directly by users, but implicitly included by the compiler when specifying the --hipstdpar option (which also triggers the definition of the __HIPSTDPAR__ macro). So if this is true (please correct me if not), then it wouldn't make sense to check for the definition of this macro IMHO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that we should further split the master headers (the ones named after the C++ Standard Library categories for algorithms)?

Not quite, what I meant is that the master header would pretty much be a list of #includes for the impl headers, with no declarations / definitions or, really, any substantive code living there. In essence, the most we'd do here is check for environmental macros / possibly declare some future config macros, if necessary. Does that make sense?

What macros do you refer to? IINM the interpose_allocations.hpp is already guarded by the __HIPSTDPAR__ and __HIPSTDPAR_INTERPOSE_ALLOC__ macros (as it was before splitting hipstdpar_lib.hpp).

What I was suggesting as a possible alternative is to check for __HIPSTDPAR_INTERPOSE_ALLOC__ in the master hipstdpar_lib.hpp header and do conditional include of the impl header in there, rather than guarding within it. This ties both to the point above re: what to do with the master header and to the one below about considering error-ing out.

And for this, I don't fully understand why would we need to error here. Originally there was no error handling in case that macro was not defined in the hipstdpar_lib.hpp, so I'm not sure why it would be done for the impl headers. Also, as I had understood it, the forwarding header was not intended to be included directly by users, but implicitly included by the compiler when specifying the --hipstdpar option (which also triggers the definition of the __HIPSTDPAR__ macro). So if this is true (please correct me if not), then it wouldn't make sense to check for the definition of this macro IMHO.

Whilst the vote of confidence is highly appreciated, I'd not take the original header to necessarily be complete / the best way to do things:) To wit, whilst the forwarding header was / is not meant to be included by users, that does not mean users won't do it (accidentally / intentionally). Error-ing out if the macro is not defined i.e. compilation for hipstdpar was not desired would guard against accidents, and will at least force intentional abusers to manually define the macro, which'd help with eventual debugging. I suggested it because I think it might be a time saver / inform users about things potentially going awry. Ultimately, it's up to you if you want to go in this direction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite, what I meant is that the master header would pretty much be a list of #includes for the impl headers, with no declarations / definitions or, really, any substantive code living there. In essence, the most we'd do here is check for environmental macros / possibly declare some future config macros, if necessary. Does that make sense?

Oh alright, I was confused because the comment was made on impl/batch.hpp. But then, the master header hipstdpar_lib.hpp is already just a macro definition check (__HIPSTDPAR__) and a bunch of includes (for the impl/ headers)

What I was suggesting as a possible alternative is to check for HIPSTDPAR_INTERPOSE_ALLOC in the master hipstdpar_lib.hpp header and do conditional include of the impl header in there, rather than guarding within it. This ties both to the point above re: what to do with the master header and to the one below about considering error-ing out.

Whilst the vote of confidence is highly appreciated, I'd not take the original header to necessarily be complete / the best way to do things:) To wit, whilst the forwarding header was / is not meant to be included by users, that does not mean users won't do it (accidentally / intentionally). Error-ing out if the macro is not defined i.e. compilation for hipstdpar was not desired would guard against accidents, and will at least force intentional abusers to manually define the macro, which'd help with eventual debugging. I suggested it because I think it might be a time saver / inform users about things potentially going awry. Ultimately, it's up to you if you want to go in this direction.

Regarding those two points, the final logic should be then:

  • Check in hipstdpar_lib.hpp if __HIPSTDPAR__ is defined and include the impl headers if so, except interpose_allocations.hpp which only will be included if __HIPSTDPAR_INTERPOSE_ALLOC__ is also defined.
  • Inside each implementation header, check if __HIPSTDPAR__ is defined and err if it's not.
  • Additionally, in interpose_allocations.hpp, check if __HIPSTDPAR_INTERPOSE_ALLOC__ is defined and err if not.

If that sounds good I'll add a commit with the changes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds great, thanks and apologies for adding work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also added the err logic discussed before

README.md Outdated Show resolved Hide resolved
@AlexVlx AlexVlx requested a review from afanfa September 12, 2024 17:19
@Beanavil
Copy link
Contributor

@AlexVlx Hi! I answered your comments about the hipstdpar modifications, as I was the one responsible for those. Thanks for reviewing this :)

@Beanavil
Copy link
Contributor

@AlexVlx just to check, are we good with the latest changes done to the hipstdpar headers? Just to know if more changes are needed or we can move on with this

Copy link
Contributor

@AlexVlx AlexVlx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for the hipstdpar piece, many thanks for doing this!

@stanleytsang-amd
Copy link
Collaborator

Apologies for the delay on getting the PR approved. Currently it looks like with the newer mainline images/compiler, there is a build error in rocThrust, but it's a compiler error not a rocThrust error. I will submit a bug report to the compiler team. I am able to build this a with 6.3 compiler and the tests pass in my limited testing, so I will merge this into develop for now.

@stanleytsang-amd
Copy link
Collaborator

stanleytsang-amd commented Oct 31, 2024

Actually, wait. I'm trying with a different compiler, this time from a 6.2.1 RC2 build, and im getting build errors that seem legitimate:
[ 17%] Building CXX object _deps/thread-building-blocks-build/test/CMakeFiles/conformance_arena_constraints.dir/conformance/conformance_arena_constraints.cpp.o
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:281:52: error: alias must point to a defined variable or function
281 | void *aligned_alloc(size_t alignment, size_t size) __TBB_ALIAS_ATTR_COPY(memalign);
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:74:56: note: expanded from macro '__TBB_ALIAS_ATTR_COPY'
74 | #define __TBB_ALIAS_ATTR_COPY(name) attribute((alias (#name)))
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:281:52: note: the function or variable specified in an alias must refer to its mangled name
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:74:56: note: expanded from macro '__TBB_ALIAS_ATTR_COPY'
74 | #define __TBB_ALIAS_ATTR_COPY(name) attribute((alias (#name)))
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:288:46: error: alias must point to a defined variable or function
288 | void *__libc_calloc(size_t num, size_t size) __TBB_ALIAS_ATTR_COPY(calloc);
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:74:56: note: expanded from macro '__TBB_ALIAS_ATTR_COPY'
74 | #define __TBB_ALIAS_ATTR_COPY(name) attribute((alias (#name)))
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:288:46: note: the function or variable specified in an alias must refer to its mangled name
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:74:56: note: expanded from macro '__TBB_ALIAS_ATTR_COPY'
74 | #define __TBB_ALIAS_ATTR_COPY(name) attribute((alias (#name)))
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:289:54: error: alias must point to a defined variable or function
289 | void *__libc_memalign(size_t alignment, size_t size) __TBB_ALIAS_ATTR_COPY(memalign);
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:74:56: note: expanded from macro '__TBB_ALIAS_ATTR_COPY'
74 | #define __TBB_ALIAS_ATTR_COPY(name) attribute((alias (#name)))
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:289:54: note: the function or variable specified in an alias must refer to its mangled name
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:74:56: note: expanded from macro '__TBB_ALIAS_ATTR_COPY'
74 | #define __TBB_ALIAS_ATTR_COPY(name) attribute((alias (#name)))
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:290:35: error: alias must point to a defined variable or function
290 | void *__libc_pvalloc(size_t size) __TBB_ALIAS_ATTR_COPY(pvalloc);
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:74:56: note: expanded from macro '__TBB_ALIAS_ATTR_COPY'
74 | #define __TBB_ALIAS_ATTR_COPY(name) attribute((alias (#name)))
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:290:35: note: the function or variable specified in an alias must refer to its mangled name
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:74:56: note: expanded from macro '__TBB_ALIAS_ATTR_COPY'
74 | #define __TBB_ALIAS_ATTR_COPY(name) attribute((alias (#name)))
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:291:34: error: alias must point to a defined variable or function
291 | void *__libc_valloc(size_t size) __TBB_ALIAS_ATTR_COPY(valloc);
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:74:56: note: expanded from macro '__TBB_ALIAS_ATTR_COPY'
74 | #define __TBB_ALIAS_ATTR_COPY(name) attribute((alias (#name)))
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:291:34: note: the function or variable specified in an alias must refer to its mangled name
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:74:56: note: expanded from macro '__TBB_ALIAS_ATTR_COPY'
74 | #define __TBB_ALIAS_ATTR_COPY(name) attribute((alias (#name)))
@Beanavil @NB4444 What ROCm build are you using to compile rocThrust with?

@NB4444
Copy link
Contributor Author

NB4444 commented Nov 1, 2024

I had a look at it. We do use 6.2.1 in our CI in which it works. Was able to reproduce this on a machine. This has to do with TBB that is used in the hipstdpar. Did find somewhere that it has to use libstdc++11 or newer for oneTBB to work. Was not able to find a fix. Will discuss with @Beanavil on Monday.

@stanleytsang-amd
Copy link
Collaborator

@Beanavil I'm not sure if your latest commit to disable TBB tests changes anything, I still seeing the same error while trying to build rocThrust unit tests. What does TBB_TEST control? I do not see its use anywhere. I will upload a docker image where I still see build errors.

@Beanavil
Copy link
Contributor

Beanavil commented Nov 8, 2024

Hi @stanleytsang-amd, the latest two commits do two things:

  • avoid compiling the hipstdpar tests when the C++ STL implementation used is (probably) incompatible with openTBB
  • does not allow oneTBB's (internal) tests to be built (because they are built by default), this is because in my local setup I got build errors coming from these tests that really don't need to be built in our use-case

I was also able yesterday to access the pipeline output and the docker images to reproduce the errors, I give more details about what I could debug in my answer to your email.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Issue]: Build with Clang >= 19 and libc++ fails on _VSTD macro
5 participants